Beware of risk for increased false positive rates in genome-wide association studies for phenotypic variability

Authors

  • Xia Shen
  • Örjan Carlborg
Abstract

Performing genome-wide association studies (GWAS) to identify genes that regulate between-genotype variability, rather than the mean, is a promising new approach for dissecting the genetics of complex traits. Using this strategy and a parametric regression model, Yang et al. (2012) identified and replicated the FTO locus and showed that it has a role in regulating the between-genotype variance heterogeneity of human body mass index. This finding illustrates the potential clinical contribution of this type of inheritance and shows that it is not only a feature of model organisms (e.g., Queitsch et al., 2002; Sangster et al., 2008; Gangaraju et al., 2011; Jimenez-Gomez et al., 2011; Christine et al., 2012; Shen et al., 2012). As this paper is likely to increase interest in applying the methodology to other human and experimental populations, we think it is important to make prospective users aware that one needs to be careful when applying similar methodology to datasets smaller than those used by Yang et al.
Yang et al. (2012) noticed that the mapping of variance-controlling loci is prone to inflated test statistics when the minor allele frequency (MAF) is small, but provided no further explanation for this. Here, we briefly explain why this observation is only half true, why GWAS analyses to detect variance heterogeneity are inherently sensitive to unbalanced data, and why researchers aiming to perform similar analyses need to be careful to avoid reporting false positive signals.
The basis for the sensitivity of variance-heterogeneity GWAS analyses is that the commonly applied statistical tests for variance heterogeneity, including regression on the squared Z-score, the Levene test (Levene, 1960), and the Brown–Forsythe test (Brown and Forsythe, 1974), are biased when applied to imbalanced samples.
The major reason for this is that the distribution of the variance often deviates from normality, as it (1) is bounded at zero; (2) is skewed to the right; and (3) has a variance that depends on its mean. Such deviations lead to violations of, e.g., the Gauss–Markov assumptions in a regression model (Plackett, 1950), which could cause problems such as those highlighted here. This bias is usually not discussed in the standard statistics literature, as it appears only when the samples are severely imbalanced and is not strong enough to matter when the tests are used without excessive multiple testing. GWAS analyses, however, go well beyond normal statistical theory by performing hundreds of thousands to millions of tests in severely imbalanced samples. As we will show below, these situations could lead to problems with type I errors, even when stringent Bonferroni-corrected thresholds are used, unless caution is taken in the design of the study and in the quality control of the results.
To illustrate this inherent problem in the statistical methodology used to test for variance heterogeneity, we used simple simulations in two populations: one with two genotypes (AA and BB) and one with three genotypes (AA, AB, and BB). In the simulations, the number of individuals in the minor genotype class (NMG) was varied in populations of increasing size. Phenotypes were simulated as pure noise from a standard normal distribution, i.e., all significant signals are false positives because no genetic effect was simulated. We performed 1,000,000 tests for a variance difference for each combination of population size and NMG. The number of tests that exceeded the Bonferroni-corrected significance threshold for 1,000,000 independent tests was counted to provide an estimate of the expected number of false positive signals in a genome scan.
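The two-genotype simulation described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function names `squared_z_test` and `false_positive_rate` are hypothetical, the p-value uses a normal approximation to the t distribution, and the demo uses far fewer tests and a much more lenient threshold than the 1,000,000 Bonferroni-corrected tests in the text so that it runs in seconds.

```python
import math
import random


def squared_z_test(minor, major):
    """t statistic for regressing squared Z-scores on a two-level genotype.

    With a binary predictor, the regression slope test is equivalent to a
    pooled-variance two-sample t-test on the squared Z-scores.
    """
    pheno = minor + major
    n = len(pheno)
    mean = sum(pheno) / n
    var = sum((y - mean) ** 2 for y in pheno) / (n - 1)
    z2_minor = [(y - mean) ** 2 / var for y in minor]
    z2_major = [(y - mean) ** 2 / var for y in major]
    n1, n2 = len(z2_minor), len(z2_major)
    m1, m2 = sum(z2_minor) / n1, sum(z2_major) / n2
    ss = sum((v - m1) ** 2 for v in z2_minor)
    ss += sum((v - m2) ** 2 for v in z2_major)
    pooled = ss / (n - 2)
    se = math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return (m1 - m2) / se


def false_positive_rate(n_total, n_minor, n_tests, alpha, seed=1):
    """Fraction of pure-noise tests with p < alpha (every hit is a false positive)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_tests):
        minor = [rng.gauss(0.0, 1.0) for _ in range(n_minor)]
        major = [rng.gauss(0.0, 1.0) for _ in range(n_total - n_minor)]
        t = squared_z_test(minor, major)
        # Assumption: two-sided normal approximation to the t distribution,
        # reasonable here because n_total - 2 degrees of freedom is large.
        p = math.erfc(abs(t) / math.sqrt(2.0))
        hits += p < alpha
    return hits / n_tests


if __name__ == "__main__":
    # Illustrative settings only (assumed, not taken from the paper).
    for nmg in (5, 100):
        rate = false_positive_rate(n_total=200, n_minor=nmg, n_tests=2000, alpha=1e-3)
        print(f"NMG={nmg}: empirical type I error rate = {rate:.4f}")
```

Because the phenotype is pure noise, a well-calibrated test should reject at a rate close to `alpha`. Comparing a small `n_minor` with a balanced design gives a quick check of how strongly a given analysis depends on the size of the minor genotype class.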
As shown in Figure 1A, when there are only two genotype classes, the type I error rate can be very large if the NMG contains fewer than 100 observations when using regression on the squared Z-score, and this cannot be overcome by increasing the total sample size. The Levene and Brown–Forsythe tests also show such an inflation of false positives (Figure 1B), but use of a Gamma regression model, which accounts for the fact that the squared Z-score follows a chi-square distribution, overcomes this problem. Populations with three genotypes will, in practice, be more robust when the allele substitution model implemented in most GWAS software is used (i.e., when regression on all three genotypes is used to estimate the additive effect). Inflated type I error rates are then observed only when the intermediate-size genotype class (i.e., in practice most often the heterozygotes) contains fewer than 100 individuals (Figures 1C–E). It should be noted, however, that if the additive genetic effect is estimated as a contrast between the homozygotes (ignoring heterozygotes) or if the dominance effect is included in the model, the bias will be determined by NMG in the same way as when only two
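As a rough illustration of the Gamma regression idea mentioned above, the sketch below computes a likelihood-ratio test from a Gamma model for squared Z-scores in two genotype classes. It rests on assumptions that are not in the source: the Gamma shape is fixed at 1/2 (the shape of a chi-square distribution with one degree of freedom) rather than estimated, the p-value relies on the asymptotic chi-square reference distribution, and the function name `gamma_lrt` is hypothetical. The authors' actual model fitting may differ.

```python
import math


def gamma_lrt(z2_minor, z2_major, shape=0.5):
    """Likelihood-ratio test for equal means of Gamma-distributed squared Z-scores.

    Assumption: the shape is fixed (0.5 matches a chi-square with 1 df).
    With one mean per genotype class, the maximum-likelihood means are the
    class averages, so the statistic has a closed form.
    """
    n1, n2 = len(z2_minor), len(z2_major)
    s1, s2 = sum(z2_minor), sum(z2_major)
    m1, m2 = s1 / n1, s2 / n2
    m0 = (s1 + s2) / (n1 + n2)
    stat = 2.0 * shape * (
        (n1 + n2) * math.log(m0) - n1 * math.log(m1) - n2 * math.log(m2)
    )
    stat = max(stat, 0.0)  # guard against tiny negative rounding error
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Toy squared Z-scores, made up for illustration only.
    stat, p = gamma_lrt([0.2, 3.1, 0.9], [1.0, 0.4, 2.2, 0.1, 1.3])
    print(f"LRT statistic = {stat:.3f}, p = {p:.3f}")
```

The statistic depends on the data only through the class means of the squared Z-scores, on a log scale, which is one way to see why a model matched to the right-skewed distribution behaves differently from ordinary least squares in small classes.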

Similar resources

P83: Role of Neuregulin 3 Genes Expression on Attention Deficits in Schizophrenia

Genetic epidemiological studies strongly suggest that additive and interactive genes, each with small effects, mediate the genetic vulnerability for schizophrenia. With the human genome working draft at hand, candidate gene (and ultimately large-scale genome-wide) association studies are gaining renewed interest in the effort to unravel the complex genetics of schizophrenia. Linkage and fine ma...

The Genetics of Non-Syndromic Primary Ovarian Insufficiency: A Systematic Review

Purpose: Several causes of primary ovarian insufficiency have been described, including iatrogenic and environmental factors, viral infections, chronic disease, and genetic alterations. Given the large number of genes described in the literature so far, the aim of this review was to collect all the genetic mutations associated with non-syndromic primary ovarian insufficiency. Methods: All...

Genome-wide Association Study to Identify Genes and Biological Pathways Associated with Type Traits in Cattle using Pathway Analysis

Extended Abstract. Introduction and Objective: Type traits describing the skeletal characteristics of an animal are moderately to strongly genetically correlated with other economically important traits in cattle, including fertility, longevity, and carcass traits. The present study aimed to conduct a genome-wide association study (GWAS) based on gene-set enrichment analysis for identifying the ...

Discovery properties of genome-wide association signals from cumulatively combined data sets.

Genetic effects for common variants affecting complex disease risk are subtle. Single genome-wide association (GWA) studies are typically underpowered to detect these effects, and combination of several GWA data sets is needed to enhance discovery. The authors investigated the properties of the discovery process in simulated cumulative meta-analyses of GWA study-derived signals allowing for pot...

Genome Wide Association Studies, Next Generation Sequencing and Their Application in Animal Breeding and Genetics: A Review

Recently, genetic studies have been revolutionized by next generation sequencing (NGS) technology, and it is expected that the use of this technology will largely eliminate defects in the methods of association studies. NGS technology is becoming the premier tool in genetics. However, at the moment the use of this method is limited, especially in livestock, due to high cost and computation...

Riboflavin Lowers Blood Pressure: A Review of a Novel Gene-nutrient Interaction

Hypertension, defined as a systolic/diastolic blood pressure of 140/90 mmHg or greater, is estimated to carry a three-fold increased risk of developing cardiovascular diseases (CVDs). Evidence from genome-wide association studies has identified an association between blood pressure and the gene encoding the folate-metabolising enzyme, methylenetetrahydrofolate reductase (MTHFR). Recent meta-ana...

Journal title:

Volume 4  Issue 

Pages  -

Publication date 2013